Abstract

We propose a graphical model for multi-view feature extraction that automatically
adapts its structure to better represent the data distribution. The proposed model,
the structure-adapting multi-view harmonium (SA-MVH), has switch parameters that
control the connections between hidden nodes and input views, and it learns these
switch parameters during training. Numerical experiments on a synthetic and a
real-world dataset demonstrate the useful behavior of SA-MVH compared to existing
multi-view feature extraction methods.



1 Introduction 

Earlier multi-view feature extraction methods, including canonical correlation analysis [1] and the
dual-wing harmonium (DWH) [2], assume that all views can be described using a single set of shared
hidden nodes. However, these methods fail when given real-world data with only partially correlated
views. More recent methods such as factorized orthogonal latent space [3] or the multi-view harmonium
(MVH) [4] assume that views are generated from two sets of hidden nodes: view-specific hidden
nodes and shared ones. Still, these models rely on a pre-defined connection structure, and deciding
the number of shared and view-specific hidden nodes requires considerable human effort.

In this paper, we propose the structure-adapting multi-view harmonium (SA-MVH), which avoids
the problems mentioned above. Instead of explicitly defining view-specific and shared hidden nodes
prior to training, we use only one set of hidden nodes and let each of them decide whether it
connects to each view, using switch parameters learned during training. In this manner, SA-MVH
automatically decides the number of view-specific latent variables and also captures partial
correlation among views.



2 The Proposed Model 

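The definition of SA-MVH begins with choosing marginal distributions of visible node sets $\{v^{(k)}\}$
and a set of hidden nodes $h$ from exponential family distributions:

$$p(v_i^{(k)}) \propto \exp\Big(\sum_a \xi_{ia}^{(k)} f_{ia}(v_i^{(k)}) - A_i^{(k)}(\{\xi_{ia}^{(k)}\})\Big), \qquad p(h_j) \propto \exp\Big(\sum_b \lambda_{jb}\, g_{jb}(h_j) - B_j(\{\lambda_{jb}\})\Big), \tag{1}$$

where $f(\cdot)$, $g(\cdot)$ are sufficient statistics, $\xi$, $\lambda$ are natural parameters, and $A$, $B$ are log-partition functions.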
Figure 1: Graphical models of (a) dual-wing harmonium, (b) multi-view harmonium, and (c)
structure-adapting multi-view harmonium.



Connections between visible nodes and hidden nodes of SA-MVH are defined by weight matrices
$\{W^{(k)}\}$ and switch parameters $\sigma(s_{kj}) \in [0, 1]$, where $\sigma(\cdot)$ is a sigmoid function. A switch $s_{kj}$
controls the connection between the $k$-th view and the $j$-th hidden node by multiplying the $j$-th
column of the weight matrix $W^{(k)}$ (Figure 1). When $\sigma(s_{kj})$ is large ($> 0.5$), we consider the view and
the hidden node to be connected. With the quadratic term including weights and switch parameters,
the joint distribution of SA-MVH is defined as below:

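$$p(\{v^{(k)}\}, h) \propto \exp\Big(\sum_{k,i,j} \sigma(s_{kj})\, w_{ij}^{(k)} f_i(v_i^{(k)})\, g_j(h_j) + \sum_{k,i} \xi_i^{(k)} f_i(v_i^{(k)}) + \sum_j \lambda_j\, g_j(h_j)\Big) \tag{2}$$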
Note that indices a and b are omitted to keep the notation uncluttered.
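The gating described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names (`gated_weights`, `shifted_hidden_params`, `is_connected`), the array layout (visible units by hidden nodes), and the toy values are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_weights(W_k, s_k):
    """Multiply column j of W_k (n_visible x n_hidden) by sigma(s_kj)."""
    return W_k * sigmoid(s_k)[np.newaxis, :]

def shifted_hidden_params(f_views, weights, switches, lam):
    """lam_j plus, summed over views k and visible units i,
    sigma(s_kj) * w_ij^(k) * f_i(v_i^(k))."""
    lam_hat = np.array(lam, dtype=float)
    for f_k, W_k, s_k in zip(f_views, weights, switches):
        lam_hat = lam_hat + f_k @ gated_weights(W_k, s_k)
    return lam_hat

def is_connected(s_k, threshold=0.5):
    """A view and a hidden node count as connected when sigma(s_kj) > 0.5."""
    return sigmoid(s_k) > threshold

# Toy example (made-up values): two views with 4 and 3 visible units, 2 hidden nodes.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 2)), rng.normal(size=(3, 2))]
switches = [np.array([4.0, 4.0]), np.array([4.0, -4.0])]  # second node is nearly cut from view 2
f_views = [rng.normal(size=4), rng.normal(size=3)]         # sufficient statistics f(v)
lam = np.zeros(2)

lam_hat = shifted_hidden_params(f_views, weights, switches, lam)
print(lam_hat)
print([is_connected(s).tolist() for s in switches])
```

Because the switch enters only as a per-column scale, a strongly negative $s_{kj}$ smoothly removes the $k$-th view's contribution to the $j$-th hidden node without changing the shape of $W^{(k)}$.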

We learn the parameters $W^{(k)}$, $\xi^{(k)}$, $\lambda$, and the switch parameters $s_{kj}$ by maximizing the likelihood of
the model via gradient ascent. The likelihood of SA-MVH is defined as the joint distribution of nodes
summed over the hidden nodes $h$:

$$\mathcal{L} = \Big\langle \log \sum_h p(\{v^{(k)}\}, h) \Big\rangle_{\text{data}},$$

where $\langle\cdot\rangle_{\text{data}}$ represents expectation over the data distribution. The gradients of the log-likelihood with
respect to the parameters $W^{(k)}$, $\lambda$, and $s_{kj}$ are then derived in terms of $\langle\cdot\rangle_{\text{model}}$, which represents
expectation over the model distribution $p(\{v^{(k)}\}, h)$, and the shifted parameters

$$\hat{\xi}_i^{(k)} = \xi_i^{(k)} + \sum_j \sigma(s_{kj})\, w_{ij}^{(k)} g_j(h_j), \qquad \hat{\lambda}_j = \lambda_j + \sum_{k,i} \sigma(s_{kj})\, w_{ij}^{(k)} f_i(v_i^{(k)}).$$
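For reference, differentiating the log-likelihood under a joint distribution of the form in Eq. (2) gives gradients of the familiar contrastive form. The following is a sketch derived under that assumption, not necessarily the exact expressions used by the authors. The notation $B_j'$ for the derivative of the hidden log-partition function is introduced here; it equals the conditional mean of $g_j(h_j)$ given the visible nodes.

```latex
\begin{align}
\frac{\partial \mathcal{L}}{\partial w_{ij}^{(k)}}
  &= \sigma(s_{kj}) \Big[
     \big\langle f_i(v_i^{(k)})\, B_j'(\hat{\lambda}_j) \big\rangle_{\text{data}}
   - \big\langle f_i(v_i^{(k)})\, B_j'(\hat{\lambda}_j) \big\rangle_{\text{model}} \Big], \\
\frac{\partial \mathcal{L}}{\partial \lambda_j}
  &= \big\langle B_j'(\hat{\lambda}_j) \big\rangle_{\text{data}}
   - \big\langle B_j'(\hat{\lambda}_j) \big\rangle_{\text{model}}, \\
\frac{\partial \mathcal{L}}{\partial s_{kj}}
  &= \sigma(s_{kj})\big(1 - \sigma(s_{kj})\big) \sum_i w_{ij}^{(k)} \Big[
     \big\langle f_i(v_i^{(k)})\, B_j'(\hat{\lambda}_j) \big\rangle_{\text{data}}
   - \big\langle f_i(v_i^{(k)})\, B_j'(\hat{\lambda}_j) \big\rangle_{\text{model}} \Big].
\end{align}
```

The factor $\sigma(s_{kj})(1 - \sigma(s_{kj}))$ is the derivative of the sigmoid, so under this form a switch that has saturated near 0 or 1 receives only small updates.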



3 Numerical Experiments 

3.1 Feature Extraction on Noisy Arabic-Roman Digit Dataset 

To simulate the view-specific and shared properties of multi-view data, we designed a synthetic
dataset containing 11,800 pairs of Arabic digits and the corresponding Roman digits written in
various fonts. For each pair, we added random vertical line noise to the Arabic digits and horizontal
line noise to the Roman digits (Figure 2(a)). SA-MVH trained with 200 hidden nodes found 95 shared
features (connected to both views), 47 view-specific features for Roman digits, and 32
for Arabic digits. The remaining 26 were not connected to any view and were ignored. Most of the shared
features were noise-free and encoded parts of Roman and Arabic numbers (Figure 2(b)). On the
other hand, the view-specific features contained horizontal or vertical noise components as well as
parts of the numbers (Figure 2(c)). In this example, SA-MVH automatically separated
view-specific and shared information without any prior specification of the graph structure.
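Counting shared, view-specific, and unconnected features amounts to thresholding the learned switches per view. The sketch below assumes the switches are stored as a hypothetical `(n_views, n_hidden)` array `S` and uses the 0.5 threshold on $\sigma(s_{kj})$; the function name and the toy values are made up for illustration and are not the values learned in the experiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def categorize_hidden_nodes(S, threshold=0.5):
    """Group hidden nodes by which views they connect to.

    S is a hypothetical (n_views, n_hidden) array of raw switch values s_kj.
    """
    connected = sigmoid(S) > threshold      # True where view k links to node j
    n_links = connected.sum(axis=0)         # number of views linked to each node
    groups = {
        "shared": np.flatnonzero(n_links >= 2),
        "unconnected": np.flatnonzero(n_links == 0),
    }
    for k in range(S.shape[0]):
        only_k = connected[k] & (n_links == 1)
        groups[f"view_{k}_only"] = np.flatnonzero(only_k)
    return groups

# Toy switch values for 2 views and 6 hidden nodes (made up for illustration).
S = np.array([[3.0,  3.0, -3.0, -3.0, 2.0, -1.0],
              [3.0, -3.0,  3.0, -3.0, 2.0, -1.0]])
groups = categorize_hidden_nodes(S)
for name, idx in groups.items():
    print(name, idx.tolist())
```

With two views, "shared" means connected to both; with more views, the same rule would also count nodes linked to only a subset of the views, which is one way to read partial correlation.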
Figure 2: (a) 10 samples from the Noisy Arabic-Roman digit dataset, (b) shared features, and (c)
view-specific features learned by SA-MVH.



Table 1: Image classification accuracy of k-NN classifiers using feature extraction methods trained on
the Caltech-256 dataset. For each value of k, the best result is marked as bold text.

| Method           | 10-NN     | 30-NN     | 50-NN     | 70-NN     | 100-NN    |
|------------------|-----------|-----------|-----------|-----------|-----------|
| Sparse Filtering | 0.161     | 0.165     | 0.163     | 0.16      | 0.155     |
| DWH              | 0.237     | 0.231     | 0.217     | 0.207     | 0.194     |
| MVH              | 0.239     | 0.225     | 0.216     | 0.203     | 0.191     |
| SA-MVH           | **0.246** | **0.232** | **0.223** | **0.212** | **0.198** |



3.2 Image Classification on Caltech-256 Dataset 

We extracted 512-dimensional GIST features and 1,536-dimensional histogram of gradients
(HoG) features from the Caltech-256 dataset to simulate a multi-view setting. SA-MVH and two other
harmonium-based multi-view feature extraction methods, DWH and MVH, were trained on the dataset
for comparison. We also compared our method to Sparse Filtering [5], which is not a
harmonium-based method. We trained each feature extraction method and evaluated its features with
k-nearest neighbor classifiers (Table 1). SA-MVH achieved higher accuracy than the other feature
extraction models in this experiment for every tested value of k.
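The evaluation protocol, k-nearest neighbor classification on extracted features, can be sketched as follows. The text does not state the distance metric or how ties are broken, so Euclidean distance and lowest-label tie-breaking are assumptions here, as are the function names and the toy data standing in for extracted features.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances between every test and training point: (n_test, n_train)
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]          # indices of the k closest
    votes = train_y[nearest]                          # labels of those neighbors
    n_classes = int(train_y.max()) + 1
    counts = np.stack([np.bincount(row, minlength=n_classes) for row in votes])
    return counts.argmax(axis=1)                      # ties go to the lowest label

def knn_accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    return float((knn_predict(train_x, train_y, test_x, k) == test_y).mean())

# Toy stand-in for extracted features: two well-separated classes in 2-D.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.5, 0.5], [10.5, 10.5]])
test_y = np.array([0, 1])

print(knn_accuracy(train_x, train_y, test_x, test_y, k=3))
```

In a setup like the one described, `train_x` and `test_x` would be the hidden representations produced by each feature extraction method, and the same routine would be run for each value of k.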



4 Conclusion 



In this paper, we have proposed a multi-view feature extraction model that automatically decides
the relations between latent variables and input views. The proposed method, SA-MVH, models the
multi-view data distribution with less restrictive assumptions and also reduces the number of
parameters that must be tuned by hand. SA-MVH introduces switch parameters that control the
connections between hidden nodes and input views and finds a desirable configuration during
training. We have demonstrated the effectiveness of our approach by comparing our model to
existing models in experiments on a synthetic dataset and on image classification in a simulated
multi-view setting.